Customer Churn Prediction in the Telecom Industry: A Machine Learning Approach ๐Ÿ“‰ยถ

Customer churnโ€”the discontinuation of a companyโ€™s servicesโ€”poses a major challenge in the telecom industry. With annual churn rates between 15-25%, reducing customer attrition is a strategic priority, as retaining existing customers is far more cost-effective than acquiring new ones.

Objectives ๐ŸŽฏยถ

This analysis aims to:

  • ๐Ÿ” Explore churn patterns across customer demographics, service types, and usage behavior.
  • ๐Ÿ“Š Identify key factors contributing to churn by analyzing correlations and feature importance.
  • ๐Ÿค– Build and evaluate predictive models, including Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), and Ensemble Methods.
  • ๐Ÿ“‰ Compare model performance using evaluation metrics to determine the most effective approach for churn prediction.

By leveraging machine learning, this study provides insights into customer churn trends, helping telecom companies identify high-risk customers and improve retention efforts.

๐Ÿ“‚ Dataset: customer_churn_dataset.csv
๐Ÿ“ˆ Evaluation Metrics: Precision, Recall, F1-score, and ROC-AUC

Inย [2]:
## REQUIRED LIBRARIES

# For data wrangling
import pandas as pd
import numpy as np

# For visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.offline import init_notebook_mode

# For preprocessing and modeling
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier

#Model building
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

init_notebook_mode(connected=True)

๐Ÿ“Š Exploratory Data Analysis (EDA)ยถ

  • Understanding the dataset distribution
  • Checking for missing values and outliers
  • Identifying feature correlations
  • Finding key patterns in churn behavior
Inย [4]:
df = pd.read_csv('customer_churn_dataset.csv')
df.head(2)
Out[4]:
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup Churn
0 Cust_1 Male 0.0 Yes No 2.0 Yes No NaN No No internet service 1
1 Cust_2 Female 1.0 No No NaN Yes No Fiber optic Yes Yes 0
Inย [7]:
df.shape
Out[7]:
(10000, 12)
Inย [9]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   customerID       10000 non-null  object 
 1   gender           9000 non-null   object 
 2   SeniorCitizen    9000 non-null   float64
 3   Partner          9000 non-null   object 
 4   Dependents       9000 non-null   object 
 5   tenure           9000 non-null   float64
 6   PhoneService     9000 non-null   object 
 7   MultipleLines    9000 non-null   object 
 8   InternetService  9000 non-null   object 
 9   OnlineSecurity   9000 non-null   object 
 10  OnlineBackup     9000 non-null   object 
 11  Churn            10000 non-null  int64  
dtypes: float64(2), int64(1), object(9)
memory usage: 937.6+ KB
Inย [11]:
df.isnull().sum()
Out[11]:
customerID            0
gender             1000
SeniorCitizen      1000
Partner            1000
Dependents         1000
tenure             1000
PhoneService       1000
MultipleLines      1000
InternetService    1000
OnlineSecurity     1000
OnlineBackup       1000
Churn                 0
dtype: int64
Inย [13]:
churned_out_color = '#B71C1C'
active_customers_color = '#00BFA5'
Inย [15]:
# Data Visualization and Exploration 
# Prepare the data
labels = ['Churned Out', 'Active Customers']
sizes = [df.Churn[df['Churn'] == 1].count(), df.Churn[df['Churn'] == 0].count()]
print(sizes)

# Create the pie chart
fig = px.pie(
    names=labels,
    values=sizes,
    title="Proportion of Customers Churned out and Active Customers",
    hole=0.0,  # For a standard pie chart; set hole=0.5 for a donut chart
)

# Optional: Tuning visual appearance
fig.update_traces(
    pull=[0, 0.05],  # Pulls the 'Retained' slice out slightly, similar to "explode"
    textinfo='percent+label',  # Show percentage and label together
    hoverinfo='label+percent+value',  # Hover information
    marker=dict(line=dict(color='black', width=0.5),colors=[churned_out_color, active_customers_color]),  # Customize marker line
)

# Adjust the layout to set the width and height
fig.update_layout(
    width=800,  # Set desired width (e.g., 600 pixels)
    height=500  # Set desired height (e.g., 400 pixels)
)


# Show the chart
fig.show()
[5020, 4980]
Inย [17]:
# Prepare data for analysis and exploration
# - Create a copy of the original DataFrame for exploratory data analysis (EDA)
# - Remove the 'customerID' column as it is irrelevant for modeling
# - Map categorical values in 'Churn' and 'SeniorCitizen' columns to more meaningful labels
#   for better readability and interpretation

df_copy = df.copy()

# Drop the customerID column
if 'customerID' in df.columns:
    df = df.drop(columns=['customerID'])

# Drop the customerID column
if 'customerID' in df_copy.columns:
    df_copy = df_copy.drop(columns=['customerID'])

# Map the Churn column to the desired labels in the copy
df_copy['Churn'] = df_copy['Churn'].map({0: 'Active Customers', 1: 'Churned Out'})
df_copy['SeniorCitizen'] = df_copy['SeniorCitizen'].map({0: 'Non-Senior Citizen', 1: 'Senior Citizen'})
Inย [19]:
#Gender
fig = px.histogram(df_copy,
                   x='gender',
                   color='Churn',
                   title='Churn Rate by Gender',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [21]:
#SeniorCitizen
fig = px.histogram(df_copy,
                   x='SeniorCitizen',
                   color='Churn',
                   title='Churn Rate by Senior Citizen',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [23]:
#Partner
fig = px.histogram(df_copy,
                   x='Partner',
                   color='Churn',
                   title='Churn Rate by Partner',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [25]:
#Dependents
fig = px.histogram(df_copy,
                   x='Dependents',
                   color='Churn',
                   title='Churn Rate by Dependents',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [27]:
#PhoneService
fig = px.histogram(df_copy,
                   x='PhoneService',
                   color='Churn',
                   title='Churn Rate by PhoneService',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [29]:
#MultipleLines
fig = px.histogram(df_copy,
                   x='MultipleLines',
                   color='Churn',
                   title='Churn Rate by MultipleLines',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [31]:
#InternetService
fig = px.histogram(df_copy,
                   x='InternetService',
                   color='Churn',
                   title='Churn Rate by InternetService',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [33]:
#OnlineSecurity
fig = px.histogram(df_copy,
                   x='OnlineSecurity',
                   color='Churn',
                   title='Churn Rate by OnlineSecurity',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [35]:
#OnlineBackup
fig = px.histogram(df_copy,
                   x='OnlineBackup',
                   color='Churn',
                   title='Churn Rate by OnlineBackup',
                   barmode='group',
                   color_discrete_sequence=[churned_out_color,active_customers_color],)

fig.update_layout(xaxis_title='Active Customers vs Churned out', yaxis_title='Count', width=800, height=400)
fig.show()
Inย [37]:
#tenure
# Group and aggregate data
grouped_data = df_copy.groupby(['tenure', 'Churn']).size().reset_index(name='Customer Count')

# Create the line chart
fig = px.line(
    grouped_data,
    x='tenure',
    y='Customer Count',
    color='Churn',
    title='Churn Rate by Tenure',
    color_discrete_sequence=[active_customers_color,churned_out_color]
)

# Update layout for better labels
fig.update_layout(
    xaxis_title='Tenure',
    yaxis_title='Customer Count',
    legend_title='Churn Status',

)

# Show the figure
fig.show()

๐Ÿ” Key Observations from Customer Churn Analysisยถ

We note the following insights from the visualizations:

๐Ÿ“Œ Churn Rate is Nearly 50%

  • The dataset contains 5,020 churned customers and 4,980 non-churned customers, making churn prediction an important task.

๐Ÿ“ˆ Most Features Show Similar Distributions

  • Gender, Partner, Dependents, PhoneService, MultipleLines, OnlineSecurity, and OnlineBackup all have nearly equal proportions between churned and non-churned customers.
  • This suggests that these individual features alone are not strong predictors of churn.

๐Ÿ“„ Tenure Shows a Clear Pattern

  • Customers with shorter tenure (0-20 months) exhibit higher churn rates, indicating that early-stage customers are more likely to leave.
  • Churn fluctuates but stabilizes beyond 30 months, though there are intermittent spikes.
  • Understanding contract renewals, pricing changes, or service issues at these peaks can provide deeper insights.
  • Retention strategies should focus on early-tenure customers, potentially through personalized offers or improved onboarding.
Inย [ย ]:
# Drop rows with missing values
df_copy = df_copy.dropna()

# Encode categorical variables
label_encoders = {}
for column in df_copy.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    df_copy[column] = le.fit_transform(df_copy[column])
    label_encoders[column] = le

# Compute the correlation matrix
correlation_matrix = df_copy.corr()

# Plot the heatmap
plt.figure(figsize=(10, 5))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

๐Ÿ“Š Understanding the Correlation Matrixยถ

The correlation matrix shows how features relate to each other and to churn. Key takeaways:

๐Ÿ“Œ No Strong Correlation with Churn

  • All features have low correlation values with churn, meaning no single feature alone is a strong predictor.
  • Tenure shows a slight negative correlation, indicating that customers with longer tenure are less likely to churn.

๐Ÿ“Œ Minimal Multicollinearity

  • No two features are highly correlated, meaning redundant features are unlikely.
  • This suggests feature interactions might be more important than individual features.

๐Ÿ” Why Analyze Feature Importance?ยถ

Since correlation alone doesn't tell us how much each feature contributes to churn, we need to evaluate feature importance:

โœ… Identify which features have the most impact on predictions.
โœ… Go beyond simple correlations by capturing non-linear relationships.
โœ… Prioritize key factors to improve churn modeling and business strategies.

To achieve this, we use RandomForestClassifier, which ranks features based on their contribution to decision-making. This helps confirm whether features like tenure and contract type are indeed the strongest predictors.

Inย [40]:
# Preprocess the data
df_copy = df_copy.dropna()  # Drop rows with missing values
label_encoders = {}
for column in df_copy.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    df_copy[column] = le.fit_transform(df_copy[column])
    label_encoders[column] = le

# Split the data into features and target
X = df_copy.drop('Churn', axis=1)
y = df_copy['Churn']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Get feature importance
feature_importance = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)

# Print feature importance
print(feature_importance)
tenure             0.429750
OnlineSecurity     0.092899
OnlineBackup       0.086982
MultipleLines      0.082934
InternetService    0.081157
Partner            0.052440
gender             0.049084
PhoneService       0.045417
SeniorCitizen      0.044344
Dependents         0.034993
dtype: float64

โšก Why Create Baseline Models?ยถ

Before building a complex model, it's essential to establish baseline performance using simpler models. This helps in:

โœ… Setting a reference point โ€“ Helps measure improvement when testing more advanced models.
โœ… Identifying initial patterns โ€“ Even simple models can highlight key predictive features.
โœ… Balancing interpretability and performance โ€“ Decision Trees and Logistic Regression provide insight into feature importance and separability.


๐Ÿ“Œ Baseline Models: Decision Tree & Logistic Regressionยถ

To create a solid starting point, we train two different models:

1๏ธโƒฃ Decision Tree Classifier

  • Captures non-linear relationships and feature interactions.
  • Helps identify key decision-making splits for churn prediction.

2๏ธโƒฃ Logistic Regression

  • A simple, interpretable model that provides probabilities of churn.
  • Acts as a benchmark to compare against more complex models.

๐Ÿ”Ž Key Metrics Evaluatedยถ

We evaluate both models using:

  • Accuracy โ€“ Overall correctness.
  • Precision โ€“ How many predicted churns were correct.
  • Recall โ€“ How many actual churn cases were detected.
  • F1 Score โ€“ A balance of precision and recall.

These baselines allow us to compare future models and ensure that advanced techniques actually provide real improvements over simpler methods. ๐Ÿš€

Inย [43]:
#Creating baseline models
# Preprocess the data (assuming df_copy is already preprocessed and ready)
# Split the data into features and target
x = df_copy.drop('Churn', axis=1)
y = df_copy['Churn']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Train a Decision Tree classifier
dt_clf = DecisionTreeClassifier(random_state=42, criterion='entropy', max_depth=5, )
dt_clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = dt_clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1_score_baseline_dt = f1_score(y_test, y_pred)
print(f'Accuracy of the DecisionTreeClassifier model: {accuracy:.3f}')
print(f'Precision of the DecisionTreeClassifier model: {precision:.3f}')
print(f'Recall of the DecisionTreeClassifier model: {recall:.3f}')
print(f'F1 Score of the DecisionTreeClassifier model: {f1_score_baseline_dt:.3f}')
Accuracy of the DecisionTreeClassifier model: 0.514
Precision of the DecisionTreeClassifier model: 0.488
Recall of the DecisionTreeClassifier model: 0.413
F1 Score of the DecisionTreeClassifier model: 0.447
Inย [45]:
#Creating baseline models

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Preprocess the data (assuming df_copy is already preprocessed and ready)
# Split the data into features and target
x = df_copy.drop('Churn', axis=1)
y = df_copy['Churn']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Train a Logistic Regression classifier
lr_clf = LogisticRegression(random_state=42, max_iter=500)
lr_clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = lr_clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1_score_baseline_lr = f1_score(y_test, y_pred)
print(f'Accuracy of the Logistic Regression model: {accuracy:.3f}')
print(f'Precision of the Logistic Regression model: {precision:.3f}')
print(f'Recall of the Logistic Regression model: {recall:.3f}')
print(f'F1 Score of the Logistic Regression model: {f1_score_baseline_lr:.3f}')
Accuracy of the Logistic Regression model: 0.559
Precision of the Logistic Regression model: 0.543
Recall of the Logistic Regression model: 0.461
F1 Score of the Logistic Regression model: 0.498

๐Ÿ› ๏ธ Data Cleaning & Preprocessingยถ

  • Handling missing values
  • Encoding categorical variables
  • Feature selection and scaling
Inย [47]:
# Filling up the missing values

#Gender
missing_gender_percent = df['gender'].isnull().sum() / len(df) * 100
print(f"Missing Gender Values: {missing_gender_percent:.2f}%")
df.loc[df['gender'].isnull(), 'gender'] = "Unknown"

#Senior Citizen
missing_senior_citizen_percent = df['SeniorCitizen'].isnull().sum() / len(df) * 100
print(f"Missing SeniorCitizen Values: {missing_senior_citizen_percent:.2f}%")
senior_dist = df['SeniorCitizen'].value_counts(normalize=True)
df.loc[df['SeniorCitizen'].isnull(), 'SeniorCitizen'] = np.random.choice([0.0, 1.0], p=senior_dist.values)

#Partner
missing_partner = df['Partner'].isnull().sum() / len(df) * 100
print(f"Missing Partner Values: {missing_partner:.2f}%")
partner_dist = df['Partner'].value_counts(normalize=True)
df.loc[df['Partner'].isnull(), 'Partner'] = np.random.choice(['Yes', 'No'], p=partner_dist.values)

#Dependents
missing_dependents = df['Dependents'].isnull().sum() / len(df) * 100
print(f"Missing Dependents Values: {missing_dependents:.2f}%")
dependent_dist = df['Dependents'].value_counts(normalize=True)
df.loc[df['Dependents'].isnull(), 'Dependents'] = np.random.choice(['Yes', 'No'], p=dependent_dist.values)

#Tenure
missing_tenure = df['tenure'].isnull().sum() / len(df) * 100
print(f"Missing Tenure Values: {missing_tenure:.2f}%")
df.loc[df['tenure'].isnull(), 'tenure'] = df['tenure'].median()

#Phone Service
missing_phone_service = df['PhoneService'].isnull().sum() / len(df) * 100
print(f"Missing PhoneService Values: {missing_phone_service:.2f}%")
phone_service_dist = df['PhoneService'].value_counts(normalize=True)
df.loc[df['PhoneService'].isnull(), 'PhoneService'] = np.random.choice(['Yes', 'No'], p=phone_service_dist.values)

#Multiple Lines
missing_multiple_lines = df['MultipleLines'].isnull().sum() / len(df) * 100
print(f"Missing MultipleLines Values: {missing_multiple_lines:.2f}%")
multiple_lines_dist = df['MultipleLines'].value_counts(normalize=True)
df.loc[df['MultipleLines'].isnull(), 'MultipleLines'] = np.random.choice(multiple_lines_dist.index, p=multiple_lines_dist.values)

#Internet Service
missing_internet_service = df['InternetService'].isnull().sum() / len(df) * 100
print(f"Missing InternetService Values: {missing_internet_service:.2f}%")
internet_service_dist = df['InternetService'].value_counts(normalize=True)
df.loc[df['InternetService'].isnull(), 'InternetService'] = np.random.choice(internet_service_dist.index, p=internet_service_dist.values)

#Online Security
missing_online_security = df['OnlineSecurity'].isnull().sum() / len(df) * 100
print(f"Missing OnlineSecurity Values: {missing_online_security:.2f}%")
online_security_dist = df['OnlineSecurity'].value_counts(normalize=True)
df.loc[df['OnlineSecurity'].isnull(), 'OnlineSecurity'] = np.random.choice(online_security_dist.index, p=online_security_dist.values)

#Online Backup
missing_online_backup = df['OnlineBackup'].isnull().sum() / len(df) * 100
print(f"Missing OnlineBackup Values: {missing_online_backup:.2f}%")
online_backup_dist = df['OnlineBackup'].value_counts(normalize=True)
df.loc[df['OnlineBackup'].isnull(), 'OnlineBackup'] = np.random.choice(online_backup_dist.index, p=online_backup_dist.values)
Missing Gender Values: 10.00%
Missing SeniorCitizen Values: 10.00%
Missing Partner Values: 10.00%
Missing Dependents Values: 10.00%
Missing Tenure Values: 10.00%
Missing PhoneService Values: 10.00%
Missing MultipleLines Values: 10.00%
Missing InternetService Values: 10.00%
Missing OnlineSecurity Values: 10.00%
Missing OnlineBackup Values: 10.00%

๐Ÿ›  Handling Missing Valuesยถ

๐Ÿง‘โ€๐Ÿคโ€๐Ÿง‘ Genderยถ

  • Missing values replaced with "Unknown" instead of imputing a category.
  • โœ… Why? Since gender is categorical and missing values are not predictable, it's better to keep them explicit rather than introducing bias.

๐Ÿ‘ด Senior Citizenยถ

  • Filled probabilistically based on the distribution of existing values.
  • โœ… Why? Maintains the real-world proportion instead of defaulting to a specific class.

๐Ÿ’‘ Partner & ๐Ÿผ Dependentsยถ

  • Filled probabilistically based on the existing ratio of "Yes"/"No".
  • โœ… Why? Prevents over-representing either category and ensures realistic data patterns.

๐Ÿ“Š Tenureยถ

  • Filled with the median instead of the mean.
  • โœ… Why? The median is less sensitive to outliers, ensuring a more balanced distribution.

๐Ÿ“ž Phone Service & ๐Ÿ“ถ Multiple Linesยถ

  • Filled probabilistically using the distribution of available values.
  • โœ… Why? Helps maintain the service adoption rate in the dataset.

๐ŸŒ Internet Serviceยถ

  • Filled probabilistically using the existing category proportions.
  • โœ… Why? Ensures that the distribution of different service types remains realistic.

๐Ÿ” Online Security & ๐Ÿ“ Online Backupยถ

  • Filled probabilistically based on category frequencies.
  • โœ… Why? Retains natural variations rather than over-sampling any single category.

๐Ÿ”น Why is probabilistic filling better?ยถ

  • Prevents bias โ€“ avoids over-representing any one category.
  • Mimics real-world patterns โ€“ missing data is distributed naturally.
  • More accurate predictions โ€“ models learn from a dataset that reflects actual trends.

๐Ÿš€ Now, our dataset is clean, consistent, and ready for analysis!

Inย [50]:
df.isnull().sum()
Out[50]:
gender             0
SeniorCitizen      0
Partner            0
Dependents         0
tenure             0
PhoneService       0
MultipleLines      0
InternetService    0
OnlineSecurity     0
OnlineBackup       0
Churn              0
dtype: int64
Inย [52]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   gender           10000 non-null  object 
 1   SeniorCitizen    10000 non-null  float64
 2   Partner          10000 non-null  object 
 3   Dependents       10000 non-null  object 
 4   tenure           10000 non-null  float64
 5   PhoneService     10000 non-null  object 
 6   MultipleLines    10000 non-null  object 
 7   InternetService  10000 non-null  object 
 8   OnlineSecurity   10000 non-null  object 
 9   OnlineBackup     10000 non-null  object 
 10  Churn            10000 non-null  int64  
dtypes: float64(2), int64(1), object(8)
memory usage: 859.5+ KB
Inย [54]:
##Encoding the data

# Create a LabelEncoder object for binary features
df.head()
# List of binary columns (for Label Encoding)
binary_cols = ['SeniorCitizen', 'Partner', 'Dependents', 'PhoneService']

# Apply Label Encoding to binary features
le = LabelEncoder()
for col in binary_cols:
    df[col] = le.fit_transform(df[col])

# List of categorical columns (for One-Hot Encoding)
categorical_cols = ['gender', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup']

# Apply One-Hot Encoding
df_preprocessed = pd.get_dummies(df, columns=categorical_cols, drop_first=False, dtype='int')
Inย [56]:
# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Apply MinMaxScaler to the 'tenure' field and create a new column 'scaled_tenure'
df_preprocessed['scaled_tenure'] = scaler.fit_transform(df[['tenure']])

๐Ÿ”„ Data Preprocessing: Encoding & Scalingยถ

To prepare our dataset for machine learning, we need to convert categorical features into numerical form and scale numerical features for better model performance.


๐Ÿ”  Encoding Categorical Dataยถ

We apply different encoding techniques based on the feature type:

โœ” Label Encoding (For Binary Features)

  • Applied to: SeniorCitizen, Partner, Dependents, PhoneService
  • Why? These features have only two categories (Yes/No or 0/1), making label encoding the most efficient approach.

โœ” One-Hot Encoding (For Multi-Category Features)

  • Applied to: gender, MultipleLines, InternetService, OnlineSecurity, OnlineBackup
  • Why? One-hot encoding creates separate columns for each category, ensuring models correctly interpret non-ordinal data.

๐Ÿ“ Scaling Numerical Featuresยถ

โœ” MinMax Scaling (For tenure)

  • Why? Normalizes values between 0 and 1, preventing tenure from dominating other features due to its larger range.

โœ… Final Step: Our dataset is now fully encoded, normalized, and ready for model training! ๐Ÿš€

Inย [59]:
# Print confirmation
print("DataFrame `df_preprocessed` is ready for model training!")
df_preprocessed.head()
DataFrame `df_preprocessed` is ready for model training!
Out[59]:
SeniorCitizen Partner Dependents tenure PhoneService Churn gender_Female gender_Male gender_Unknown MultipleLines_No ... InternetService_DSL InternetService_Fiber optic InternetService_No OnlineSecurity_No OnlineSecurity_No internet service OnlineSecurity_Yes OnlineBackup_No OnlineBackup_No internet service OnlineBackup_Yes scaled_tenure
0 0 1 0 2.0 1 1 0 1 0 1 ... 0 1 0 1 0 0 0 1 0 0.014085
1 1 0 0 37.0 1 0 1 0 0 1 ... 0 1 0 0 0 1 0 0 1 0.507042
2 0 0 1 37.0 1 0 0 1 0 0 ... 0 1 0 1 0 0 0 1 0 0.507042
3 1 0 0 13.0 1 0 0 1 0 0 ... 0 1 0 1 0 0 0 1 0 0.169014
4 1 1 1 55.0 0 1 0 0 1 1 ... 0 1 0 0 0 1 0 0 1 0.760563

5 rows ร— 22 columns

Inย [61]:
df_preprocessed.describe()
Out[61]:
SeniorCitizen Partner Dependents tenure PhoneService Churn gender_Female gender_Male gender_Unknown MultipleLines_No ... InternetService_DSL InternetService_Fiber optic InternetService_No OnlineSecurity_No OnlineSecurity_No internet service OnlineSecurity_Yes OnlineBackup_No OnlineBackup_No internet service OnlineBackup_Yes scaled_tenure
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 ... 10000.000000 10000.0000 10000.000000 10000.000000 10000.000000 10000.000000 10000.00000 10000.000000 10000.000000 10000.000000
mean 0.550100 0.447700 0.548800 36.513900 0.541000 0.502000 0.449300 0.450700 0.100000 0.301900 ... 0.302000 0.4044 0.293600 0.300500 0.296600 0.402900 0.30380 0.398600 0.297600 0.500196
std 0.497509 0.497282 0.497638 19.630256 0.498341 0.500021 0.497448 0.497588 0.300015 0.459105 ... 0.459148 0.4908 0.455434 0.458498 0.456781 0.490506 0.45992 0.489635 0.457225 0.276482
min 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.0000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 21.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.0000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.281690
50% 1.000000 0.000000 1.000000 37.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.0000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.507042
75% 1.000000 1.000000 1.000000 52.000000 1.000000 1.000000 1.000000 1.000000 0.000000 1.000000 ... 1.000000 1.0000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 0.718310
max 1.000000 1.000000 1.000000 72.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.0000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000

8 rows ร— 22 columns

Inย [63]:
import plotly.express as px

# Drop the 'tenure' column
filtered_df = df_preprocessed.drop(columns=['tenure'])

# Convert DataFrame to long format for Plotly
df_melted = filtered_df.melt(var_name='Feature', value_name='Value')

# Create an interactive box plot with thicker elements
fig = px.box(
    df_melted, 
    x='Value', 
    y='Feature', 
    title="Box Plot of Features",
    color='Feature',  # Different colors for each feature
    color_discrete_sequence=px.colors.qualitative.Prism  # Color palette
)

# Increase thickness of box elements
fig.update_traces(
    boxmean=True,  # Show mean as a line inside the box
    marker=dict(size=6),  # Make outlier points bigger
    line=dict(width=3)  # Make box plot lines thicker
)

# Improve layout
fig.update_layout(
    xaxis_title="Value Distribution",
    yaxis_title="Features",
    width=900,
    height=500,
    font=dict(family="Arial, sans-serif", size=12, color="black"),
    margin=dict(l=100, r=50, t=50, b=50)  # Adjust margins
)

fig.show()

๐Ÿค– Machine Learning Modelsยถ

  • Creating models (Decision Tree, Logistic Regression, KNN, RandomForest Classifier)
  • Evaluating performance (Accuracy, Precision, Recall, F1-score)
  • Identifying important features for churn prediction
  • Improving model performance with hyperparameter tuning

๐Ÿ“Œ Decision Tree Classifier - Hyperparameter Tuning & Evaluationยถ

1๏ธโƒฃ Manual Hyperparameter Tuningยถ

  • A Decision Tree Classifier is trained with manually set hyperparameters.
  • The model is evaluated using Accuracy, Precision, Recall, F1-Score, and AUC-ROC to measure performance.
Inย [65]:
#Checking model building with manual tuning of hyperparameters - Decision Tree

# Decision Tree Classifier - Test Size = 0.2
x_dt = df_preprocessed.drop(['Churn', 'scaled_tenure'], axis=1)
y_dt = df_preprocessed['Churn']

# Split the data with test_size = 0.2
x_train_dt, x_test_dt, y_train_dt, y_test_dt = train_test_split(
    x_dt, y_dt, test_size=0.2, random_state=42
)

# Initialize and fit the Decision Tree Classifier with the given hyperparameters (manual tuning)
dt_clf = DecisionTreeClassifier(
    random_state=42,
    criterion='entropy',
    max_depth=7,
    min_samples_leaf=1,
    min_samples_split=2
)
dt_clf.fit(x_train_dt, y_train_dt)

# Make predictions
y_pred_dt = dt_clf.predict(x_test_dt)

# Evaluate performance
accuracy = accuracy_score(y_test_dt, y_pred_dt)
precision = precision_score(y_test_dt, y_pred_dt, pos_label=1)
recall = recall_score(y_test_dt, y_pred_dt, pos_label=1)
f1 = f1_score(y_test_dt, y_pred_dt, pos_label=1)

# Display results
print("\nResults of Decision Tree Classifier with Test Size = 0.2:")
print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1-Score: {f1:.3f}")
Results of Decision Tree Classifier with Test Size = 0.2:
Accuracy: 0.472
Precision: 0.481
Recall: 0.494
F1-Score: 0.487

2๏ธโƒฃ Finding Best Hyperparameters with GridSearchCVยถ

  • GridSearchCV is used to identify the best combination of hyperparameters.
  • The search is performed over different values of max_depth, criterion, min_samples_split, and min_samples_leaf.
  • The model is evaluated using 5-fold cross-validation with F1-score as the scoring metric.
Inย [67]:
#Finding the best hyperparameters for the Decision tree with Grid Search CV

x_dt = df_preprocessed.drop(['Churn','scaled_tenure'], axis=1)
y_dt = df_preprocessed['Churn']

# Split data into training and testing sets
x_train_dt, x_test_dt, y_train_dt, y_test_dt = train_test_split(x_dt, y_dt, test_size=0.2, random_state=42)

# Define the refined parameter grid
param_grid = {
    'max_depth': [1, 2, 3, 5, 7],  # Avoiding 'None' since deep trees overfit
    'criterion': ['gini', 'entropy'],
    'min_samples_split': [2, 3, 5],
    'min_samples_leaf': [1, 2, 3]
}

# Initialize GridSearchCV with 5-fold cross-validation
grid_search_dt = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    scoring='f1',
    cv=5,
    verbose=1,
    n_jobs=-1
)

# Perform the grid search
grid_search_dt.fit(x_train_dt, y_train_dt)

# Retrieve the best model
best_clf = grid_search_dt.best_estimator_

# Make predictions on the test set using the best model
y_pred_dt = best_clf.predict(x_test_dt)

# Evaluate the best model
accuracy_dt = accuracy_score(y_test_dt, y_pred_dt)
precision_dt = precision_score(y_test_dt, y_pred_dt)
recall_dt = recall_score(y_test_dt, y_pred_dt)
f1_score_dt = f1_score(y_test_dt, y_pred_dt)

# Print the results
print("Best Parameters for Decision Tree Classifier:", grid_search_dt.best_params_)
print(f'Accuracy: {accuracy_dt:.3f}')
print(f'Precision: {precision_dt:.3f}')
print(f'Recall: {recall_dt:.3f}')
print(f'F1 Score: {f1_score_dt:.3f}')
Fitting 5 folds for each of 90 candidates, totalling 450 fits
Best Parameters for Decision Tree Classifier: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 3, 'min_samples_split': 2}
Accuracy: 0.476
Precision: 0.486
Recall: 0.544
F1 Score: 0.513

3๏ธโƒฃ Evaluating the Best Model on Different Test Splitsยถ

  • The best parameters from GridSearchCV are used to train and test models across different test sizes (0.1, 0.2, 0.3, 0.4).
  • The performance of each model is compared using Accuracy, Precision, Recall, F1-Score, and AUC-ROC to analyze the impact of different test splits.

๐Ÿ”น This process ensures that the model is well-optimized and generalizes effectively across different data splits.

Inย [70]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc, classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Define features and target
x_dt = df_preprocessed.drop(['Churn', 'scaled_tenure'], axis=1)
y_dt = df_preprocessed['Churn']

# Define test sizes to evaluate
test_sizes = [0.1, 0.2, 0.3, 0.4]

# Store results for each test size
results_dt = []

for test_size in test_sizes:
    # Split the data
    x_train_dt, x_test_dt, y_train_dt, y_test_dt = train_test_split(
        x_dt, y_dt, test_size=test_size, random_state=42
    )

    # Initialize and fit the Decision Tree Classifier with best parameters
    dt_clf = DecisionTreeClassifier(**grid_search_dt.best_params_)
    dt_clf.fit(x_train_dt, y_train_dt)

    # Make predictions
    y_train_pred_dt = dt_clf.predict(x_train_dt)
    y_test_pred_dt = dt_clf.predict(x_test_dt)
    y_score_dt = dt_clf.predict_proba(x_test_dt)[:, 1]  # Probability scores for ROC curve

    # Compute evaluation metrics
    train_accuracy = accuracy_score(y_train_dt, y_train_pred_dt)
    test_accuracy = accuracy_score(y_test_dt, y_test_pred_dt)
    accuracy = accuracy_score(y_test_dt, y_test_pred_dt)
    precision = precision_score(y_test_dt, y_test_pred_dt, zero_division=1)
    recall = recall_score(y_test_dt, y_test_pred_dt, zero_division=1)
    f1 = f1_score(y_test_dt, y_test_pred_dt, zero_division=1)

    # Compute ROC curve
    fpr, tpr, _ = roc_curve(y_test_dt, y_score_dt)
    roc_auc = auc(fpr, tpr)

    # Store results
    results_dt.append((test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_dt, y_test_pred_dt, classification_report(y_test_dt, y_test_pred_dt), fpr, tpr))

# Display all results at the end
print("\nSummary of Results for Decision Tree Classifier:")
for i, (test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, _, _, _, _, _) in enumerate(results_dt):
    if i == 1:  # Highlight the second record
        print(
            f"\033[1mTest Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, "
            f"F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}\033[0m"
        )
    else:
        print(
            f"Test Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, "
            f"F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}"
        )
Summary of Results for Decision Tree Classifier:
Test Size: 0.10 | Accuracy: 0.487, Precision: 0.506, Recall: 0.474, F1-Score: 0.490, AUC-ROC: 0.494
Test Size: 0.20 | Accuracy: 0.476, Precision: 0.486, Recall: 0.544, F1-Score: 0.513, AUC-ROC: 0.466
Test Size: 0.30 | Accuracy: 0.498, Precision: 0.512, Recall: 0.525, F1-Score: 0.518, AUC-ROC: 0.494
Test Size: 0.40 | Accuracy: 0.504, Precision: 0.522, Recall: 0.463, F1-Score: 0.491, AUC-ROC: 0.505
Inย [72]:
# Extract values for plotting from Decision Tree results
test_sizes = [r[0] for r in results_dt]
train_accuracies = [r[6] for r in results_dt]
test_accuracies = [r[7] for r in results_dt]

# Plot training vs. validation accuracy
plt.figure(figsize=(6, 4))
plt.plot(test_sizes, train_accuracies, label="Training Accuracy", marker='o', linestyle='--', color='blue')
plt.plot(test_sizes, test_accuracies, label="Validation Accuracy", marker='s', linestyle='-', color='red')

plt.xlabel("Test Size")
plt.ylabel("Accuracy")
plt.title("Training vs. Validation Accuracy for Decision Tree")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
Inย [74]:
# Extract results for the first test size from Decision Tree results
(test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy,
 y_test_dt, y_pred_dt, classification_report_dt, fpr, tpr) = results_dt[1]

# Print classification report and AUC-ROC
print(f"Classification Report for Decision Tree (Test Size: {test_size:.2f}):\n")
print(classification_report_dt)
print(f"\nAUC-ROC: {roc_auc:.3f}")

# Plot ROC Curve
plt.figure(figsize=(6, 4))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC-ROC = {roc_auc:.3f}')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title(f'Decision Tree - ROC Curve (Test Size: {test_size:.2f})')
plt.legend(loc="lower right")
plt.grid()
plt.show()

# Compute confusion matrix
conf_matrix = confusion_matrix(y_test_dt, y_pred_dt)

# Display confusion matrix
print(f"\nConfusion Matrix for Decision Tree (Test Size: {test_size:.2f}):")
print(conf_matrix)

# Visualize confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=['Not Churned', 'Churned'])
disp.plot(cmap='Blues')
plt.title(f'Confusion Matrix - Decision Tree (Test Size: {test_size:.2f})')
plt.show()
Classification Report for Decision Tree (Test Size: 0.20):

              precision    recall  f1-score   support

           0       0.46      0.41      0.43       985
           1       0.49      0.54      0.51      1015

    accuracy                           0.48      2000
   macro avg       0.48      0.48      0.47      2000
weighted avg       0.48      0.48      0.47      2000


AUC-ROC: 0.466
No description has been provided for this image
Confusion Matrix for Decision Tree (Test Size: 0.20):
[[401 584]
 [463 552]]
No description has been provided for this image

๐Ÿ“Œ Logistic Regression Classifier - Hyperparameter Tuning & Evaluationยถ

1๏ธโƒฃ Manual Hyperparameter Tuningยถ

  • A Logistic Regression model is trained with manually set hyperparameters.
  • The model is evaluated using Accuracy, Precision, Recall, F1-Score, and AUC-ROC to measure performance.
Inย [76]:
#Manual tuning of hyperparameters for Decision Logistic Regression

# Logistic Regression Classifier - Test Size = 0.2
x_lr = df_preprocessed.drop(['Churn', 'tenure'], axis=1)
y_lr = df_preprocessed['Churn']

# Split the data with test_size = 0.2
x_train_lr, x_test_lr, y_train_lr, y_test_lr = train_test_split(
    x_lr, y_lr, test_size=0.2, random_state=42
)

# Initialize and fit the Logistic Regression model
model = LogisticRegression(
    random_state=42,
    C=0.01,
    l1_ratio=0.6,
    max_iter=200,
    penalty='elasticnet',
    solver='saga'
)

model.fit(x_train_lr, y_train_lr)

# Make predictions
y_pred = model.predict(x_test_lr)

# Evaluate performance
accuracy_lr = accuracy_score(y_test_lr, y_pred)
precision_lr = precision_score(y_test_lr, y_pred, zero_division=1)  # Handle undefined precision
recall_lr = recall_score(y_test_lr, y_pred, zero_division=1)
f1_lr = f1_score(y_test_lr, y_pred, zero_division=1)

# Display results
print("\nResults of Logistic Regression Classifier with Test Size = 0.2 & Manual tuning the hyperparameters")
print(f"Accuracy: {accuracy_lr:.3f}")
print(f"Precision: {precision_lr:.3f}")
print(f"Recall: {recall_lr:.3f}")
print(f"F1-Score: {f1_lr:.3f}")
Results of Logistic Regression Classifier with Test Size = 0.2 & Manual tuning the hyperparameters
Accuracy: 0.507
Precision: 0.507
Recall: 1.000
F1-Score: 0.673

2๏ธโƒฃ Finding Best Hyperparameters with GridSearchCVยถ

  • GridSearchCV is used to optimize the penalty, C, solver, and max_iter values.
  • The search is performed using 5-fold cross-validation with F1-score as the metric.
  • Why F1-score?
    • While optimizing for Recall ensures we engage every potential churned customer, it can increase False Positives, leading to extra marketing costs.
    • F1-score balances Precision and Recall, ensuring we prioritize retention without excessive resource wastage.
Inย [88]:
#Finding the best hyperparameters for the Logistic Regression

# Finding the best hyperparameters for Logistic Regression
x_lr = df_preprocessed.drop(['Churn', 'tenure'], axis=1)
y_lr = df_preprocessed['Churn']

# Split data into training and testing sets
x_train_lr, x_test_lr, y_train_lr, y_test_lr = train_test_split(x_lr, y_lr, test_size=0.2, random_state=42)

# Define the parameter grid for Logistic Regression
# Define the parameter grid
param_grid_lr = [
    {'penalty': ['l1'], 'C': [0.05, 0.1, 1, 10], 'solver': ['liblinear'], 'max_iter': [50, 100, 200, 500]},
    {'penalty': ['l2'], 'C': [0.05, 0.1, 1, 10], 'solver': ['liblinear', 'saga'], 'max_iter': [50, 100, 200, 500]},
    {'penalty': ['elasticnet'], 'C': [0.05, 0.1, 1, 10], 'solver': ['saga'], 'l1_ratio': [0.5], 'max_iter': [50, 100, 200, 500]}
]


# Initialize GridSearchCV
grid_search_lr = GridSearchCV(
    estimator=LogisticRegression(random_state=42),
    param_grid=param_grid_lr,
    scoring='f1',
    cv=5,
    verbose=1,
    n_jobs=-1
)

# Perform the grid search
grid_search_lr.fit(x_train_lr, y_train_lr)

# Retrieve the best model from the search
best_lr_clf = grid_search_lr.best_estimator_

# Make predictions on the test set using the best model
y_pred_lr = best_lr_clf.predict(x_test_lr)

# Evaluate the best model
accuracy_lr = accuracy_score(y_test_lr, y_pred_lr)
precision_lr = precision_score(y_test_lr, y_pred_lr)
recall_lr = recall_score(y_test_lr, y_pred_lr)
f1_score_lr = f1_score(y_test_lr, y_pred_lr)

# Print the results
print("Best Parameters for Logistic Regression Classifier:", grid_search_lr.best_params_)
print(f'Accuracy: {accuracy_lr:.3f}')
print(f'Precision: {precision_lr:.3f}')
print(f'Recall: {recall_lr:.3f}')
print(f'F1 Score: {f1_score_lr:.3f}')
Fitting 5 folds for each of 64 candidates, totalling 320 fits
Best Parameters for Logistic Regression Classifier: {'C': 0.05, 'max_iter': 50, 'penalty': 'l1', 'solver': 'liblinear'}
Accuracy: 0.508
Precision: 0.515
Recall: 0.549
F1 Score: 0.531

3๏ธโƒฃ Evaluating the Best Model on Different Test Splitsยถ

  • The best parameters from GridSearchCV are used to train and test models across different test sizes (0.1, 0.2, 0.3, 0.4).
  • The performance of each model is compared using Accuracy, Precision, Recall, F1-Score, and AUC-ROC to analyze the impact of different test splits.

๐Ÿ”น This process ensures that the model is well-optimized, achieves high recall, and generalizes effectively across different data splits.

Inย [90]:
# Define features and target
x_lr = df_preprocessed.drop(['Churn', 'tenure'], axis=1)
y_lr = df_preprocessed['Churn']

# Define possible test sizes
test_sizes = [0.1, 0.2, 0.3, 0.4]

# Store results for each test size
results_lr = []

for test_size in test_sizes:
    # Split the data
    x_train_lr, x_test_lr, y_train_lr, y_test_lr = train_test_split(
        x_lr, y_lr, test_size=test_size, random_state=42
    )

    # Initialize and fit the Logistic Regression model with best parameters
    model = LogisticRegression(**grid_search_lr.best_params_)
    model.fit(x_train_lr, y_train_lr)

    # Make predictions
    y_train_pred_lr = model.predict(x_train_lr)
    y_test_pred_lr = model.predict(x_test_lr)
    y_score_lr = model.predict_proba(x_test_lr)[:, 1]  # Probability scores for ROC curve

    # Compute evaluation metrics
    train_accuracy = accuracy_score(y_train_lr, y_train_pred_lr)
    test_accuracy = accuracy_score(y_test_lr, y_test_pred_lr)
    accuracy = accuracy_score(y_test_lr, y_test_pred_lr)
    precision = precision_score(y_test_lr, y_test_pred_lr, zero_division=1)
    recall = recall_score(y_test_lr, y_test_pred_lr, zero_division=1)
    f1 = f1_score(y_test_lr, y_test_pred_lr, zero_division=1)

    # Compute ROC curve
    fpr, tpr, _ = roc_curve(y_test_lr, y_score_lr)
    roc_auc = auc(fpr, tpr)

    # Store results
    results_lr.append((test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_lr, y_test_pred_lr, classification_report(y_test_lr, y_test_pred_lr),fpr, tpr))

# Display all results at the end
print("\nSummary of Results for Logistic Regression Classifier:")
for i, (test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, _, _, _, _, _) in enumerate(results_lr):
    if i == 1:  # Highlight the second record
        print(
            f"\033[1mTest Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, "
            f"F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}\033[0m"
        )
    else:
        print(
            f"Test Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, "
            f"F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}"
        )
Summary of Results for Logistic Regression Classifier:
Test Size: 0.10 | Accuracy: 0.486, Precision: 0.505, Recall: 0.520, F1-Score: 0.512, AUC-ROC: 0.500
Test Size: 0.20 | Accuracy: 0.508, Precision: 0.515, Recall: 0.549, F1-Score: 0.531, AUC-ROC: 0.515
Test Size: 0.30 | Accuracy: 0.502, Precision: 0.520, Recall: 0.403, F1-Score: 0.454, AUC-ROC: 0.509
Test Size: 0.40 | Accuracy: 0.502, Precision: 0.552, Recall: 0.191, F1-Score: 0.284, AUC-ROC: 0.507
Inย [92]:
# Extract values for plotting
test_sizes = [r[0] for r in results_lr]
train_accuracies = [r[6] for r in results_lr]
test_accuracies = [r[7] for r in results_lr]

# Plot training vs. validation accuracy
plt.figure(figsize=(6, 4))
plt.plot(test_sizes, train_accuracies, label="Training Accuracy", marker='o', linestyle='--', color='blue')
plt.plot(test_sizes, test_accuracies, label="Validation Accuracy", marker='s', linestyle='-', color='red')

plt.xlabel("Test Size")
plt.ylabel("Accuracy")
plt.title("Training vs. Validation Accuracy for Logistic Regression")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
Inย [94]:
# Extract results for the first test size
test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_lr, y_pred_lr, classification_report_lr, fpr, tpr = results_lr[1]

# Print classification report and AUC-ROC
print(f"Classification Report for Logistic Regression (Test Size: {test_size:.2f}):\n")
print(classification_report_lr)
print(f"\nAUC-ROC: {roc_auc:.3f}")

# Plot ROC Curve
plt.figure(figsize=(6, 4))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC-ROC = {roc_auc:.3f}')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title(f'Logistic Regression - ROC Curve (Test Size: {test_size:.2f})')
plt.legend(loc="lower right")
plt.grid()
plt.show()

# Compute confusion matrix
conf_matrix = confusion_matrix(y_test_lr, y_pred_lr)

# Display confusion matrix
print(f"\nConfusion Matrix for Logistic Regression (Test Size: {test_size:.2f}):")
print(conf_matrix)

# Visualize confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=['Not Churned', 'Churned'])
disp.plot(cmap='Blues')
plt.title(f'Confusion Matrix - Logistic Regression (Test Size: {test_size:.2f})')
plt.show()
Classification Report for Logistic Regression (Test Size: 0.20):

              precision    recall  f1-score   support

           0       0.50      0.47      0.48       985
           1       0.51      0.55      0.53      1015

    accuracy                           0.51      2000
   macro avg       0.51      0.51      0.51      2000
weighted avg       0.51      0.51      0.51      2000


AUC-ROC: 0.515
No description has been provided for this image
Confusion Matrix for Logistic Regression (Test Size: 0.20):
[[460 525]
 [458 557]]
No description has been provided for this image

๐Ÿ” Comparison of Logistic Regression and Decision Tree Modelsยถ

๐Ÿ’Š Performance Metricsยถ

๐Ÿ”น Metric โšก Logistic Regression ๐ŸŒณ Decision Tree
Accuracy 0.508 0.476
Precision 0.515 0.486
Recall 0.549 0.544
F1-Score 0.531 0.513
AUC-ROC 0.515 0.466

๐Ÿ”Ž Key Insightsยถ

โœ… 1. Logistic Regression is the Stronger Modelยถ

  • Higher Accuracy (0.508 vs. 0.476) โ†’ Better at overall classification.
  • Higher Precision (0.515 vs. 0.486) โ†’ Reduces false positives, leading to more reliable predictions.
  • Higher AUC-ROC (0.515 vs. 0.466) โ†’ Better at distinguishing between classes.

๐ŸŒŸ 2. Decision Tree Still Captures a Lot of Churnersยถ

  • Recall is close (0.544 vs. 0.549) โ†’ Detects nearly as many true churners as Logistic Regression.
  • Lower precision means it produces more false positives, which may not be ideal for cost-sensitive decisions.

โš–๏ธ 3. Logistic Regression Has a Better F1-Score (0.531 vs. 0.513)ยถ

  • More balanced between precision and recall, making it the better overall classifier.

๐Ÿ Final Verdictยถ

๐Ÿ”น ๐Ÿ† Logistic Regression is the clear winner, with superior accuracy, precision, recall, and AUC-ROC.
๐Ÿ”น ๐ŸŒณ Decision Tree may still be useful when prioritizing recall, but it struggles with precision and overall performance.
๐Ÿ”น Both models can be improvedโ€”we should explore feature engineering, hyperparameter tuning, and advanced models.

๐Ÿš€ Next Steps:
Weโ€™ll now evaluate K-Nearest Neighbors and Ensemble Methods to improve model performance!

Assignment part 2ยถ

๐Ÿ“Œ K-Nearest Neighbors Classifier - Evaluationยถ

Inย [97]:
# K-Nearest Neighbors Classifier

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report, roc_curve, auc, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Define features and target
x_knn = df_preprocessed.drop(['Churn', 'tenure'], axis=1)
y_knn = df_preprocessed['Churn']

# Define possible test sizes
test_sizes = [0.1, 0.2, 0.3, 0.4]

# Store results for each test size
results_knn = []

for test_size in test_sizes:
    # Split the data
    x_train_knn, x_test_knn, y_train_knn, y_test_knn = train_test_split(
        x_knn, y_knn, test_size=test_size, random_state=42
    )

    # Initialize the KNN model with specified hyperparameters (Hyperparameters are taken from the grid search)
    knn_model = KNeighborsClassifier(
        metric='euclidean',
        n_neighbors=5,
        weights='distance'
    )

    # Fit the model
    knn_model.fit(x_train_knn, y_train_knn)

    # Make predictions
    y_pred_knn = knn_model.predict(x_test_knn)
    y_score_knn = knn_model.predict_proba(x_test_knn)[:, 1]

    # Evaluate performance
    accuracy = accuracy_score(y_test_knn, y_pred_knn)
    precision = precision_score(y_test_knn, y_pred_knn, zero_division=1)
    recall = recall_score(y_test_knn, y_pred_knn, zero_division=1)
    f1 = f1_score(y_test_knn, y_pred_knn, zero_division=1)
    auc_roc = roc_auc_score(y_test_knn, y_score_knn)
    classification_report_knn = classification_report(y_test_knn, y_pred_knn)

    # Compute ROC curve
    fpr, tpr, _ = roc_curve(y_test_knn, y_score_knn)

    # Store the results
    results_knn.append((test_size, accuracy, precision, recall, f1, auc_roc, y_test_knn, y_pred_knn, classification_report_knn, fpr, tpr))

# Display all results at the end
print("\nSummary of Results for K-Nearest Neighbors Classifier:")
for i, (test_size, accuracy, precision, recall, f1, auc_roc, _, _, _, _, _) in enumerate(results_knn):
    if i == 1:  # Highlight the second record
        print(
            f"\033[1mTest Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, F1-Score: {f1:.3f}, AUC-ROC: {auc_roc:.3f}\033[0m")
    else:
        print(
            f"Test Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, F1-Score: {f1:.3f}, AUC-ROC: {auc_roc:.3f}")
Summary of Results for K-Nearest Neighbors Classifier:
Test Size: 0.10 | Accuracy: 0.489, Precision: 0.508, Recall: 0.487, F1-Score: 0.498, AUC-ROC: 0.500
Test Size: 0.20 | Accuracy: 0.495, Precision: 0.503, Recall: 0.495, F1-Score: 0.499, AUC-ROC: 0.497
Test Size: 0.30 | Accuracy: 0.490, Precision: 0.504, Recall: 0.489, F1-Score: 0.496, AUC-ROC: 0.490
Test Size: 0.40 | Accuracy: 0.491, Precision: 0.507, Recall: 0.490, F1-Score: 0.499, AUC-ROC: 0.494
Inย [99]:
# Extract values for plotting
test_sizes = [r[0] for r in results_knn]
accuracies = [r[1] for r in results_knn]

# Plot training vs. validation accuracy
plt.figure(figsize=(6, 4))
plt.plot(test_sizes, train_accuracies, label="Training Accuracy", marker='o', linestyle='--', color='blue')
plt.plot(test_sizes, test_accuracies, label="Validation Accuracy", marker='s', linestyle='-', color='red')

plt.xlabel("Test Size")
plt.ylabel("Accuracy")
plt.title("Accuracy vs. Test Size for KNN")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
Inย [101]:
# Extract results for the first test size from KNN results
test_size, accuracy, precision, recall, f1, auc_roc, y_test_knn, y_pred_knn, classification_report_knn, fpr, tpr = results_knn[1]

# Print the classification report and AUC-ROC
print("Classification Report for K-Nearest Neighbors (Test Size: {:.2f}):\n".format(test_size))
print(classification_report_knn)
print("\nAUC-ROC: {:.3f}".format(auc_roc))

plt.figure(figsize=(6, 4))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC-ROC = {auc_roc:.3f}')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('K-Nearest Neighbors - ROC Curve (Test Size: {:.2f})'.format(test_size))
plt.legend(loc="lower right")
plt.grid()
plt.show()

# Compute confusion matrix
conf_matrix = confusion_matrix(y_test_knn, y_pred_knn)

# Display confusion matrix
print("\nConfusion Matrix for K-Nearest Neighbors (Test Size: {:.2f}):".format(test_size))
print(conf_matrix)

# Visualize confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=['Not Churned', 'Churned'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix - K-Nearest Neighbors (Test Size: {:.2f})'.format(test_size))
plt.show()
Classification Report for K-Nearest Neighbors (Test Size: 0.20):

              precision    recall  f1-score   support

           0       0.49      0.50      0.49       985
           1       0.50      0.49      0.50      1015

    accuracy                           0.49      2000
   macro avg       0.50      0.50      0.49      2000
weighted avg       0.50      0.49      0.50      2000


AUC-ROC: 0.497
No description has been provided for this image
Confusion Matrix for K-Nearest Neighbors (Test Size: 0.20):
[[488 497]
 [513 502]]
No description has been provided for this image

๐Ÿ“Œ Ensemble Method - Random Forest Classifier - Evaluationยถ

Inย [103]:
# Random Forest Classifier (Ensemble Method)

# Feature matrix and target variable
x_rf = df_preprocessed.drop(['Churn', 'tenure'], axis=1)
y_rf = df_preprocessed['Churn']

# Define possible test sizes
test_sizes = [0.1, 0.2, 0.3, 0.4]

# Store results for each test size
results_rf = []

for test_size in test_sizes:
    # Split the data
    x_train_rf, x_test_rf, y_train_rf, y_test_rf = train_test_split(
        x_rf, y_rf, test_size=test_size, random_state=42
    )

    # Initialize Random Forest model with specified hyperparameters with manual tuning
    rf_model = RandomForestClassifier(
        n_estimators=100,  # Number of trees in the forest
        max_depth=None,  # Maximum depth of the tree (None means nodes expand until all leaves are pure)
        random_state=42,  # Random seed for reproducibility
        bootstrap=True,  # Bagging enabled
    )

    # Fit the model on the training data
    rf_model.fit(x_train_rf, y_train_rf)

    # Predict on training and test data
    y_train_pred_rf = rf_model.predict(x_train_rf)
    y_test_pred_rf = rf_model.predict(x_test_rf)

    # Evaluate performance
    train_accuracy = accuracy_score(y_train_rf, y_train_pred_rf)
    test_accuracy = accuracy_score(y_test_rf, y_test_pred_rf)
    accuracy = accuracy_score(y_test_rf, y_test_pred_rf)
    precision = precision_score(y_test_rf, y_test_pred_rf, zero_division=1)
    recall = recall_score(y_test_rf, y_test_pred_rf, zero_division=1)
    f1 = f1_score(y_test_rf, y_test_pred_rf, zero_division=1)
    classification_report_rf = classification_report(y_test_rf, y_test_pred_rf)

    # Compute and plot AUC-ROC curve
    y_score_rf = rf_model.predict_proba(x_test_rf)[:, 1]
    fpr, tpr, _ = roc_curve(y_test_rf, y_score_rf)
    roc_auc = auc(fpr, tpr)


    # Store results
    results_rf.append((test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_rf, y_test_pred_rf, classification_report_rf))

# Display all results at the end
print("\nSummary of Results for Random Forest Classifier (Bagging):")
for i, (test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_rf, y_test_pred_rf, classification_report_rf) in enumerate(results_rf):
    if i == 1:  # Highlight the second record
        print(
            f"\033[1mTest Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}\033[0m"
        )
    else:
        print(
            f"Test Size: {test_size:.2f} | Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, F1-Score: {f1:.3f}, AUC-ROC: {roc_auc:.3f}"
        )
Summary of Results for Random Forest Classifier (Bagging):
Test Size: 0.10 | Accuracy: 0.483, Precision: 0.502, Recall: 0.493, F1-Score: 0.498, AUC-ROC: 0.491
Test Size: 0.20 | Accuracy: 0.483, Precision: 0.491, Recall: 0.482, F1-Score: 0.486, AUC-ROC: 0.494
Test Size: 0.30 | Accuracy: 0.489, Precision: 0.503, Recall: 0.481, F1-Score: 0.492, AUC-ROC: 0.494
Test Size: 0.40 | Accuracy: 0.493, Precision: 0.509, Recall: 0.485, F1-Score: 0.497, AUC-ROC: 0.499
Inย [104]:
# Extract values for plotting
test_sizes = [r[0] for r in results_rf]
train_accuracies = [r[6] for r in results_rf]
test_accuracies = [r[7] for r in results_rf]

# Plot training vs. validation accuracy
plt.figure(figsize=(6, 4))
plt.plot(test_sizes, train_accuracies, label="Training Accuracy", marker='o', linestyle='--', color='blue')
plt.plot(test_sizes, test_accuracies, label="Validation Accuracy", marker='s', linestyle='-', color='red')

plt.xlabel("Test Size")
plt.ylabel("Accuracy")
plt.title("Training vs. Validation Accuracy for Random Forest")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
Inย [107]:
# Extract results for the first test size
test_size, accuracy, precision, recall, f1, roc_auc, train_accuracy, test_accuracy, y_test_rf, y_test_pred_rf, classification_report_rf = results_rf[1]

# Print the classification report and AUC-ROC
print("Classification Report for Random forest (Test Size: {:.2f}):\n".format(test_size))
print(classification_report_rf)
print("\nAUC-ROC: {:.3f}".format(roc_auc))


plt.figure(figsize=(6, 4))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC-ROC = {roc_auc:.3f}')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Random forest - ROC Curve (Test Size: {:.2f})'.format(test_size))
plt.legend(loc="lower right")
plt.grid()
plt.show()

# Compute confusion matrix
conf_matrix = confusion_matrix(y_test_rf, y_test_pred_rf)

# Display confusion matrix
print("\nConfusion Matrix for Random forest (Test Size: {:.2f}):".format(test_size))
print(conf_matrix)

# Visualize confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=['Not Churned', 'Churned'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix - Random forest (Test Size: {:.2f})'.format(test_size))
plt.show()
Classification Report for Random forest (Test Size: 0.20):

              precision    recall  f1-score   support

           0       0.48      0.49      0.48       985
           1       0.49      0.48      0.49      1015

    accuracy                           0.48      2000
   macro avg       0.48      0.48      0.48      2000
weighted avg       0.48      0.48      0.48      2000


AUC-ROC: 0.494
No description has been provided for this image
Confusion Matrix for Random forest (Test Size: 0.20):
[[478 507]
 [526 489]]
No description has been provided for this image
Inย [111]:
# Performance compare of each models

# Data
models = ['Decision Tree', 'Logistic Regression', 'KNN', 'Random Forest']
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score', 'AUC-ROC']

# Extract the second (index 1) results for each model and round to 3 decimal places
data = {
    'Decision Tree': [round(metric, 3) for metric in results_dt[1][1:6]],  # Skip the test size (1st item)
    'Logistic Regression': [round(metric, 3) for metric in results_lr[1][1:6]],  # Skip the test size
    'KNN': [round(metric, 3) for metric in results_knn[1][1:6]],  # Skip the test size
    'Random Forest': [round(metric, 3) for metric in results_rf[1][1:6]]  # Skip the test size
}

print(data)

# Print the created data dictionary
# print("Extracted Data (Rounded to 3 Decimal Places):")
# print(data)

# Convert to DataFrame
df_models = pd.DataFrame(data, index=metrics)

# Transform DataFrame into long format for Plotly
df_melted = df_models.reset_index().melt(id_vars='index', var_name='Model', value_name='Score')
df_melted.rename(columns={'index': 'Metric'}, inplace=True)

# Plot using Plotly
fig = px.histogram(
    df_melted,
    x='Metric',  # Metrics on the x-axis
    y='Score',  # Scores on the y-axis
    color='Model',  # Grouped by models
    barmode='group',  # Bars grouped side-by-side
    title='Model Performance Comparison',  # Title of the chart
    color_discrete_sequence=px.colors.qualitative.Prism  # Define color palette
)

# Customize layout
fig.update_layout(
    xaxis_title='Evaluation Metrics',
    yaxis_title='Score',
    width=1000,
    height=500,
    legend_title='Models'
)

# Show the interactive plot
fig.show()
{'Decision Tree': [0.476, 0.486, 0.544, 0.513, 0.466], 'Logistic Regression': [0.508, 0.515, 0.549, 0.531, 0.515], 'KNN': [0.495, 0.503, 0.495, 0.499, 0.497], 'Random Forest': [0.483, 0.491, 0.482, 0.486, 0.494]}

๐Ÿ“Š Customer Churn Prediction Model Evaluationยถ

๐Ÿš€ Problem Statementยถ

Predicting customer churn is crucial for telecom companies to retain customers and reduce revenue loss. Churn occurs when customers discontinue services, impacting business sustainability. By accurately predicting churn, companies can implement targeted retention strategies such as personalized offers, better customer service, and proactive engagement.


๐Ÿ“‰ Overall Model Performanceยถ

All models exhibit relatively low performance, with accuracy scores hovering around 50%. This suggests potential challenges in the dataset, such as: ๐Ÿ”น High noise โ€“ irrelevant or inconsistent data
๐Ÿ”น Weak predictive features โ€“ limited strong indicators of churn
๐Ÿ”น Class imbalance โ€“ disproportionate churn vs. non-churn cases

However, even slight improvements over random guessing (50%) can translate into significant business impact, making these insights valuable for retention efforts.


๐Ÿค– Models Evaluated & Metricsยถ

We trained and tested four machine learning models to predict churn:

โœ” Decision Tree
โœ” Logistic Regression
โœ” K-Nearest Neighbors (KNN)
โœ” Random Forest

Each model was evaluated using:

  • Accuracy โ†’ Overall correctness of the model.
  • Precision โ†’ Percentage of predicted churners that actually churned.
  • Recall โ†’ Percentage of actual churners correctly identified.
  • F1-Score โ†’ Balances precision and recall for overall effectiveness.
  • AUC-ROC โ†’ Measures the modelโ€™s ability to distinguish churners from non-churners.

๐Ÿ“Š Performance Comparisonยถ

Model Accuracy Precision Recall F1-Score AUC-ROC
Decision Tree 0.476 0.486 0.544 0.513 0.466
Logistic Regression 0.508 0.515 0.549 0.531 0.515
KNN 0.495 0.503 0.495 0.499 0.497
Random Forest 0.483 0.491 0.482 0.486 0.494

๐Ÿ† Best Model Selection โ€“ Logistic Regressionยถ

Among the evaluated models, Logistic Regression achieves the highest accuracy (0.508), recall (0.549), and AUC-ROC (0.515), making it the best performer.

๐Ÿ”น Why Logistic Regression?ยถ

โœ… Highest Recall (0.549) and AUC-ROC (0.515)
โœ” Outperforms Decision Tree, KNN, and Random Forest in distinguishing churners from non-churners.

โœ… Best Balance Between Precision and Recall
โœ” Ensures a good trade-off between correctly identifying churners and minimizing false positives.

โœ… More Consistent Performance
โœ” Unlike KNN and Random Forest, which have lower recall, Logistic Regression captures more churners effectively.


๐Ÿ’ก Business Impactยถ

Churn prediction is a trade-off between recall and precision:

๐Ÿ”น Decision Tree prioritizes recall, meaning it catches more churners but misclassifies more loyal customers, increasing unnecessary interventions.
๐Ÿ”น Logistic Regression balances recall and precision, making it a reliable alternative.
๐Ÿ”น KNN and Random Forest perform worse overall, with lower recall and accuracy.


โš–๏ธ Considerations & Next Stepsยถ

โœ” If recall is the top priority, Decision Tree remains a good choice.
โœ” For interpretability and computational efficiency, Logistic Regression is preferable.
โœ” Exploring ensemble methods and feature engineering could further improve predictions.


๐Ÿ”ฎ Conclusionยถ

For customer churn prediction, Logistic Regression emerges as the best model due to its superior accuracy, recall, and AUC-ROC score. Further refinements in feature selection and hyperparameter tuning could improve overall performance for better business impact.